biplotEZ

User-friendly biplots in R



Centre for Multi-Dimensional Data Visualisation (MuViSU)
muvisu@sun.ac.za



SASA 2024

Introduction

The biplotEZ package aims to provide users with EZier software to construct biplots.


What is a biplot?

Visualisation of multi-dimensional data in 2 or 3 dimensions.


A brief history of biplots and biplotEZ

History (1)

1971

Gabriel, K.R., The biplot graphic display of matrices with application to principal component analysis. Biometrika, 58(3), pp.453-467.

1976

Prof Niël J le Roux presents a seminar on biplots.

Photo NJ_hist

History (2)

1996

John Gower publishes Biplots with David Hand.

Photo GH_hist

Prof le Roux introduces a Masters module on Biplots (Multidimensional scaling).

Rika Cilliers obtains her Masters on biplots for socio-economic progress under Prof le Roux.

History (3)

1997

SASA conference paper: S-PLUS FUNCTIONS FOR INTERACTIVE LINEAR AND NON-LINEAR BIPLOTS by SP van Blerk, NJ le Roux & S Gardner.

2001

Sugnet Gardner (Lubbe) obtains her PhD on biplots under Prof le Roux.

Photo SL_hist

History (4)

2001

Louise Wood obtains her Masters on biplots for socio-economic development under Prof le Roux.

2003

Adele Bothma obtains her Masters on biplots for school results under Prof le Roux.

2007

Idele Walters obtains her Masters on biplots for exploring the gender gap under Prof le Roux.

History (5)

2008

Ryan Wedlake obtains his Masters on robust biplots under Prof le Roux.

2009

BiplotGUI for Interactive Biplots, Anthony la Grange.

2010

André Mostert obtains his Masters on biplots in industry under Prof le Roux.

History (6)

2011

John Gower, Sugnet Lubbe and Niël le Roux publish Understanding Biplots.

Photo UB_hist

R package UBbipl developed with the book, but never published.

History (7)

2013

Hilmarie Brand obtains her Masters on PCA and CVA biplots under Prof le Roux.

2014

Opeoluwa Oyedele obtains her PhD on Partial Least Squares biplots under Sugnet Lubbe.

2015

Ruan Rossouw obtains his PhD on using biplots for multivariate process monitoring under Prof le Roux.

2016

Ben Gurr obtains his Masters on biplots for crime data under Prof le Roux.

History (8)

2019

Johané Nienkemper-Swanepoel obtains her PhD on MCA biplots under Prof le Roux and Sugnet Lubbe.

Photo JN_hist

Carel van der Merwe obtains his PhD using biplots. Carel supervises 4 Master’s projects on biplots.

  • Justin Perrang, Francesca van Niekerk, David Rodwell, Delia Sandilands

History (9)

2020

Raeesa Ganey obtains her PhD on Principal Surface Biplots under Sugnet Lubbe.

Photo RG_hist

André Mostert obtains his PhD on multidimensional scaling for identification of contributions to out of control multivariate processes under Sugnet Lubbe.

History (10)

2020

Adriaan Rowen obtains his Masters using biplots to understand black-box machine learning models.

2022

Zoë-Mae Adams obtains her Masters on biplots in sentiment classification under Johané Nienkemper-Swanepoel.

Photo ZA_hist

History (11)

2023

bipl5 for Exploding Biplots, Ruan Buys.

2024

Ruan Buys obtains his Masters on Exploding biplots under Carel van der Merwe.

Photo RB_hist

History (12)

2024

Adriaan Rowen to submit his PhD using biplots to understand black-box machine learning models.

Peter Manefeldt to submit his Masters using multidimensional scaling for interpretability of random forest models.

Photo PM_hist

The Team

Photo 1

Photo 2

Photo 3

Photo 4

Photo 5

Photo 6

Photo 7

More biplots

  • CVA biplots for two classes
  • Regression biplot
  • Spline biplot
  • Principal Coordinate Analysis biplot
  • Analysis of Distance biplot

CVA biplots for two classes

Canonical space of dimension 1.

Solve \(\mathbf{BM=WM\Lambda}\) where \(\mathbf{M} = \begin{bmatrix} \mathbf{m}_1 & \mathbf{M}_2\\ \end{bmatrix}\)

\[ \bar{\mathbf{Y}} = \bar{\mathbf{X}} \mathbf{M} = \begin{bmatrix} \bar{y}_{11} & 0 & \dots & 0 \\ \vdots & \vdots & & \vdots\\ \bar{y}_{K1} & 0 & \dots & 0 \\ \end{bmatrix} \]

\[ \mathbf{\Lambda} = \operatorname{diag}(\lambda, 0, \dots, 0) \]

Total squared reconstruction error for means: \(TSREM = tr\{ (\bar{\mathbf{X}}-\hat{\bar{\mathbf{X}}})(\bar{\mathbf{X}}-\hat{\bar{\mathbf{X}}})'\} = 0\)

Total squared reconstruction error for samples: \(TSRES = tr\{ ({\mathbf{X}}-\hat{{\mathbf{X}}})({\mathbf{X}}-\hat{{\mathbf{X}}})'\} >0\)
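
As a rough base-R illustration of the statements above (not biplotEZ's internal code), the sketch below builds the between- and within-class matrices for the two iris classes used later in these slides, solves \(\mathbf{BM=WM\Lambda}\) through the symmetric matrix \(\mathbf{W}^{-1/2}\mathbf{B}\mathbf{W}^{-1/2}\), and computes both reconstruction errors from the single canonical dimension. The object names (`W_inv_half`, `tsrem`, `tsres`) and the choice to centre at the overall mean are assumptions made for the illustration.

```r
# Two-class CVA by hand on versicolor vs virginica (base R only).
X  <- as.matrix(iris[51:150, 1:4])
g  <- droplevels(iris$Species[51:150])
Xc <- sweep(X, 2, colMeans(X))                   # centre at the overall mean

G    <- model.matrix(~ g - 1)                    # n x K indicator matrix
N    <- crossprod(G)                             # diagonal matrix of class sizes
Xbar <- solve(N) %*% t(G) %*% Xc                 # K x p matrix of class means
B    <- t(Xbar) %*% N %*% Xbar                   # between-class SSP matrix
W    <- crossprod(Xc) - B                        # within-class SSP matrix

# Solve BM = WM Lambda via the symmetric matrix W^(-1/2) B W^(-1/2).
eW         <- eigen(W, symmetric = TRUE)
W_inv_half <- eW$vectors %*% diag(1 / sqrt(eW$values)) %*% t(eW$vectors)
eS         <- eigen(W_inv_half %*% B %*% W_inv_half, symmetric = TRUE)
M          <- W_inv_half %*% eS$vectors
lambda     <- eS$values                          # one positive value, the rest ~ 0

Ybar <- Xbar %*% M                               # canonical means
Minv <- solve(M)

# Reconstruct means and samples from the first canonical dimension only.
Xbar_hat <- Ybar[, 1, drop = FALSE] %*% Minv[1, , drop = FALSE]
Xc_hat   <- (Xc %*% M)[, 1, drop = FALSE] %*% Minv[1, , drop = FALSE]
tsrem <- sum((Xbar - Xbar_hat)^2)                # numerically zero
tsres <- sum((Xc - Xc_hat)^2)                    # strictly positive
```

Printing `round(lambda, 10)` and `round(Ybar, 10)` shows the single nonzero eigenvalue and the zero columns of \(\bar{\mathbf{Y}}\); the remaining columns of `M` span a space in which any basis reproduces the means equally well, which is why a further criterion is needed to choose the second biplot axis.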

CVA biplots for two classes

Minimise \(TSRES\) (Default option)

Alternative option: Maximise Bhattacharyya distance. For more details see

  • le Roux, N. and Gardner-Lubbe, S., 2024. A two-group canonical variate analysis biplot for an optimal display of both means and cases. Advances in Data Analysis and Classification, pp.1-28.

\[ \mathbf{M}^{-1} = \begin{bmatrix} \mathbf{m}^{(1)} \\ \mathbf{M}^{(2)}\\ \end{bmatrix} \]

\[ \mathbf{M}^{(2)}\mathbf{M}^{(2)'} = \mathbf{UDV}' \]

\[ \mathbf{M}_{opt} = \begin{bmatrix} \mathbf{m}_1 & \mathbf{M}_2\mathbf{V}\\ \end{bmatrix} \]
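
One possible reading of this construction, sketched in base R under stated assumptions: \(\mathbf{m}_1\) is taken to be the eigenvector with the nonzero eigenvalue, \(\mathbf{V}\) is taken from `svd()`, and the two-dimensional biplot is assumed to use the first two columns of \(\mathbf{M}_{opt}\). The helper `tsres_2d()` is hypothetical and is not part of biplotEZ; it only compares the sample reconstruction error before and after the rotation.

```r
# Same two-class setup as in the slides: versicolor vs virginica.
X  <- as.matrix(iris[51:150, 1:4])
g  <- droplevels(iris$Species[51:150])
Xc <- sweep(X, 2, colMeans(X))
G    <- model.matrix(~ g - 1)
N    <- crossprod(G)
Xbar <- solve(N) %*% t(G) %*% Xc
B    <- t(Xbar) %*% N %*% Xbar
W    <- crossprod(Xc) - B

eW         <- eigen(W, symmetric = TRUE)
W_inv_half <- eW$vectors %*% diag(1 / sqrt(eW$values)) %*% t(eW$vectors)
M <- W_inv_half %*%
  eigen(W_inv_half %*% B %*% W_inv_half, symmetric = TRUE)$vectors

# Partition M = [m_1, M_2] and M^{-1} = [m^(1); M^(2)].
Minv  <- solve(M)
M2    <- M[, -1, drop = FALSE]
Minv2 <- Minv[-1, , drop = FALSE]

# SVD of M^(2) M^(2)' gives the rotation V applied to the last p - 1 columns.
V     <- svd(Minv2 %*% t(Minv2))$v
M_opt <- cbind(M[, 1], M2 %*% V)

# Hypothetical helper: sample reconstruction error of a 2D biplot built from A.
tsres_2d <- function(A) {
  Ainv <- solve(A)
  Xhat <- (Xc %*% A)[, 1:2] %*% Ainv[1:2, ]
  sum((Xc - Xhat)^2)
}
tsres_default <- tsres_2d(M)      # arbitrary basis for the zero eigenvalues
tsres_opt     <- tsres_2d(M_opt)  # basis rotated by V
```

Under these assumptions `tsres_opt` should not exceed `tsres_default`, while the first column, and therefore the display of the class means, is unchanged.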

CVA biplots for two classes

library (biplotEZ)
# Two-class CVA biplot: versicolor vs virginica (rows 51 to 150 of iris)
biplot (iris[51:150,]) |> CVA (classes = iris[51:150,5]) |> means (cex=2) |>
  axes (label.dir = "Hor", label.line=c(0.8,0,0,0)) |> plot ()

Regression biplot

Any 2D representation of sample points, for example

library (MASS)
# Row 102 is dropped because it duplicates row 143, and sammon() cannot handle zero distances.
Zmat <- sammon (dist(iris[-102,1:4], method="manhattan"))$points
# Initial stress        : 0.01116
# stress after  10 iters: 0.00833, magic = 0.018
# stress after  20 iters: 0.00614, magic = 0.213
# stress after  30 iters: 0.00561, magic = 0.500
# stress after  40 iters: 0.00558, magic = 0.500

To create a biplot we need to add information on the variables.

\[ \mathbf{X}:n \times p \]

\[ \mathbf{Z}:n \times 2 \]

\[ \mathbf{X = ZB + E} \]

\[ \mathbf{B} = (\mathbf{Z'Z})^{-1}\mathbf{Z'X} \]
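
The least-squares step can be checked numerically. The sketch below is a base-R illustration, not the code behind `regress()`: it regresses the centred data on a two-dimensional configuration \(\mathbf{Z}\), so that \(\mathbf{B}\) is \(2 \times p\) and solves the normal equations \(\mathbf{Z'ZB} = \mathbf{Z'X}\). Classical scaling via `cmdscale()` stands in for the Sammon mapping purely to keep the example free of extra packages; that substitution and the centring of \(\mathbf{X}\) are assumptions of the example.

```r
# Any two-dimensional configuration Z of the samples will do; classical
# scaling of Manhattan distances is used here purely as an example.
X <- as.matrix(iris[-102, 1:4])
X <- sweep(X, 2, colMeans(X))                 # centre the variables
Z <- cmdscale(dist(X, method = "manhattan"), k = 2)

# Least-squares fit of X = ZB + E: solve (Z'Z) B = Z'X, giving a 2 x p matrix.
B <- solve(crossprod(Z), crossprod(Z, X))

# The same coefficients from lm(), regressing every variable on Z.
B_lm <- coef(lm(X ~ Z - 1))

# Residuals are orthogonal to the columns of Z.
E <- X - Z %*% B
```

Column \(k\) of `B` gives the direction in the plane along which variable \(k\) is fitted, which is the information a linear biplot axis adds to the sample points.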

Regression biplot

biplot (iris[-102,]) |> regress (Zmat) |>  plot ()

Spline biplot

Are linear axes a good representation when the transformation from \(\mathbf{X}:n \times p\) to \(\mathbf{Z}:n \times 2\) is nonlinear?

Replace linear regression with splines.

Spline biplot

biplot (iris[-102,1:4]) |> regress (Zmat, axes="splines") |>  plot ()
# Calculating spline axis for variable 1 
# Calculating spline axis for variable 2 
# Calculating spline axis for variable 3 
# Calculating spline axis for variable 4

Principal Coordinate Analysis (PCO) biplot

Where the Regression biplot is constructed for any 2D representation of sample points, PCO is more specific.

  • Compute \(n \times n\) matrix of distances \(\mathbf{D}\) between samples and
  • perform classical scaling on distance matrix.

Distances should be Euclidean embeddable.

This means that the samples can be embedded in a Euclidean space so that the Euclidean distances between samples are EXACTLY those in \(\mathbf{D}\).

Flexibility: distances for mixed-type variables.
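
To make the two steps and the embeddability condition concrete, here is a small base-R sketch of classical scaling, using ordinary Euclidean distances on the iris measurements as an example. It is only an illustration of the technique and makes no claim about how `PCO()` is implemented; the tolerance of `1e-8` relative to the largest eigenvalue is an arbitrary choice.

```r
X <- as.matrix(iris[, 1:4])
D <- as.matrix(dist(X))                     # n x n matrix of Euclidean distances
n <- nrow(D)

# Double-centre the squared distances: -1/2 * J D^2 J with J = I - 11'/n.
J    <- diag(n) - matrix(1 / n, n, n)
Bmat <- -0.5 * J %*% (D^2) %*% J
e    <- eigen(Bmat, symmetric = TRUE)

# Euclidean embeddable if no eigenvalue is negative (up to rounding error).
embeddable <- all(e$values > -1e-8 * max(e$values))

# Coordinates: eigenvectors scaled by the square roots of the eigenvalues.
r      <- sum(e$values > 1e-8 * max(e$values))   # dimension of the embedding
Y_full <- e$vectors[, 1:r] %*% diag(sqrt(e$values[1:r]), r)
Y_2d   <- Y_full[, 1:2]                          # the two-dimensional display

# Distances in the full embedding reproduce D, up to rounding error.
max_err <- max(abs(as.matrix(dist(Y_full)) - D))
```

`Y_2d` agrees with `cmdscale(dist(X), k = 2)` up to the sign of each column. For a distance that is not Euclidean embeddable, some eigenvalues in `e$values` would be clearly negative and no configuration could reproduce \(\mathbf{D}\) exactly.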

Principal Coordinate Analysis (PCO) biplot

head(mtcars, 4)
#                 mpg cyl disp  hp drat    wt  qsec vs am gear carb
# Mazda RX4      21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
# Mazda RX4 Wag  21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
# Datsun 710     22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
# Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
mtdf <- as.data.frame(mtcars)
mtdf$cyl <- factor(mtdf$cyl)
levels(mtdf$cyl) <- c("cyl4","cyl6","cyl8")
mtdf$vs <- factor(mtdf$vs)
levels(mtdf$vs) <- c("V","straight")
mtdf$am <- factor(mtdf$am)
levels(mtdf$am) <- c("auto","man")
mtdf$gear <- factor(mtdf$gear)
levels(mtdf$gear) <- c("3gears","4gears","5gears")
mtdf$carb <- factor(mtdf$carb)
levels(mtdf$carb) <- paste0("carb",levels(mtdf$carb))
head(mtdf, 4)
#                 mpg  cyl disp  hp drat    wt  qsec       vs   am   gear  carb
# Mazda RX4      21.0 cyl6  160 110 3.90 2.620 16.46        V  man 4gears carb4
# Mazda RX4 Wag  21.0 cyl6  160 110 3.90 2.875 17.02        V  man 4gears carb4
# Datsun 710     22.8 cyl4  108  93 3.85 2.320 18.61 straight  man 4gears carb1
# Hornet 4 Drive 21.4 cyl6  258 110 3.08 3.215 19.44 straight auto 3gears carb1

Distance calculations

\[ \mathbf{D} = \mathbf{D}_{cont} + \mathbf{D}_{cat} \]

Manhattan distance is not Euclidean embeddable, but its square root is:

\[ d_{ij}^{(cont)} = \sqrt{\sum_{k=1}^{p_1} |x_{ik} - x_{jk}| } \]

An example of a Euclidean embeddable distance metric for nominal data is the extended matching coefficient: counting the number of mismatches among the variables.

Care has to be taken in terms of scaling the data.

  • A mismatch has maximum value 1 for each nominal variable.
  • Distance metric for continuous variables can take on any positive value.

Scale data matrix to unit range for each numerical variable.
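
The pieces above can be put together by hand. The following base-R sketch is an illustration only: it assumes the two parts are added on the squared-distance scale (the slide does not say which scale \(\mathbf{D}\) uses), counts mismatches directly, and does not claim to reproduce what `sqrtManhattan` or `extended.matching.coefficient` compute internally. The helper `unit_range()` is hypothetical.

```r
# Split mtcars into numerical and categorical variables.
num_vars <- c("mpg", "disp", "hp", "drat", "wt", "qsec")
cat_vars <- c("cyl", "vs", "am", "gear", "carb")

# Scale each numerical variable to unit range so that its largest possible
# contribution matches that of a single mismatch.
unit_range <- function(x) (x - min(x)) / (max(x) - min(x))
num01 <- sapply(mtcars[, num_vars], unit_range)

# The square of the square-root-Manhattan distance is the Manhattan distance.
d2_cont <- as.matrix(dist(num01, method = "manhattan"))

# Number of mismatches over the categorical variables.
n <- nrow(mtcars)
d2_cat <- matrix(0, n, n)
for (v in cat_vars) {
  x <- as.character(mtcars[[v]])
  d2_cat <- d2_cat + outer(x, x, "!=")
}

# Assumption: the two parts are combined on the squared-distance scale.
d2 <- d2_cont + d2_cat

# Embeddability check: eigenvalues of the double-centred matrix.
J  <- diag(n) - matrix(1 / n, n, n)
ev <- eigen(-0.5 * J %*% d2 %*% J, symmetric = TRUE)$values
```

With this combination no eigenvalue in `ev` is negative beyond rounding error, so `sqrt(d2)` can be embedded exactly and passed to classical scaling.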

Principal Coordinate Analysis (PCO) biplot

out <- biplot (mtdf, scaled=TRUE) |> 
         PCO(dist.func=sqrtManhattan, dist.func.cat = extended.matching.coefficient) |> 
         plot()
out$CLPs
# $cyl
#            [,1]        [,2]        [,3]         [,4]        [,5]        [,6]
# cyl4 -0.1576990  0.01746158  0.10914860 -0.007384645 -0.01081132  0.10408388
# cyl6 -0.0361361  0.04091779 -0.30845799  0.061155842  0.07565951 -0.21383080
# cyl8  0.1419744 -0.03417871  0.06846939 -0.024775700 -0.02933515  0.02513521
#             [,7]        [,8]        [,9]       [,10]       [,11]       [,12]
# cyl4  0.10295686  0.09715798  0.02955071  0.02270304 -0.07250053 -0.04033517
# cyl6  0.02602324 -0.19396640 -0.11161150  0.06726009  0.15668971 -0.02208976
# cyl8 -0.09390629  0.02064479  0.03258733 -0.05146815 -0.02138016  0.04273680
#             [,13]       [,14]       [,15]        [,16]       [,17]       [,18]
# cyl4  0.003767696  0.01289741  0.02559460  0.006414443  0.01365358 -0.00244133
# cyl6  0.028670687  0.04892620 -0.07596851  0.030730707 -0.05825319 -0.02722437
# cyl8 -0.017295676 -0.03459678  0.01787421 -0.020405273  0.01839878  0.01553037
#            [,19]       [,20]       [,21]        [,22]        [,23]       [,24]
# cyl4 -0.01660534 -0.06236104  0.06151852 -0.066461608  0.005908796 -0.01072738
# cyl6 -0.06243720  0.04132410  0.04263103  0.092553574  0.030019417  0.05787527
# cyl8  0.04426565  0.02833591 -0.06965149  0.005943048 -0.019652334 -0.02050898
#             [,25]        [,26]        [,27]        [,28]       [,29]
# cyl4 -0.008833978 -0.020486649  0.008380403 -0.012352426 -0.02542859
# cyl6  0.029802095  0.027768242  0.004333248  0.008872777  0.01443634
# cyl8 -0.007960065  0.002212532 -0.008751226  0.005269089  0.01276144
#             [,30]       [,31]
# cyl4 -0.027345797 -0.01591496
# cyl6 -0.002601609  0.05275984
# cyl8  0.022786788 -0.01387531
# 
# $vs
#                [,1]       [,2]         [,3]        [,4]       [,5]        [,6]
# V         0.1210856 -0.1229671  0.001484353 -0.05860577  0.1060289 -0.04545602
# straight -0.1556814  0.1581006 -0.001908453  0.07535028 -0.1363229  0.05844346
#                 [,7]        [,8]       [,9]       [,10]      [,11]       [,12]
# V         0.01768239  0.06791988  0.0999510 -0.09504417  0.1199845 -0.04532229
# straight -0.02273450 -0.08732556 -0.1285084  0.12219965 -0.1542657  0.05827152
#                [,13]       [,14]       [,15]       [,16]       [,17]
# V        -0.09795101  0.06264533  0.01074659  0.03963688 -0.02508510
# straight  0.12593701 -0.08054399 -0.01381704 -0.05096170  0.03225227
#                 [,18]      [,19]       [,20]       [,21]        [,22]
# V         0.002846486  0.0259413  0.05476036 -0.01530882  0.002740039
# straight -0.003659767 -0.0333531 -0.07040617  0.01968277 -0.003522908
#                [,23]        [,24]       [,25]       [,26]       [,27]
# V        -0.01912753 -0.007003177  0.02817584  0.05246427 -0.01111300
# straight  0.02459254  0.009004084 -0.03622608 -0.06745406  0.01428815
#                [,28]       [,29]       [,30]       [,31]
# V        -0.02292320  0.02634642 -0.04044291  0.01597322
# straight  0.02947269 -0.03387397  0.05199802 -0.02053700
# 
# $am
#             [,1]       [,2]        [,3]         [,4]        [,5]       [,6]
# auto  0.08773296  0.1773496 -0.02454688  0.005333500 -0.01126537  0.1195963
# man  -0.12822510 -0.2592033  0.03587622 -0.007795116  0.01646477 -0.1747945
#             [,7]        [,8]        [,9]      [,10]       [,11]       [,12]
# auto -0.01089476 -0.02266337  0.06803535  0.1248348  0.02164351 -0.06267158
# man   0.01592312  0.03312338 -0.09943628 -0.1824509 -0.03163283  0.09159693
#            [,13]        [,14]       [,15]       [,16]       [,17]         [,18]
# auto -0.01032808  0.001101037  0.03550550  0.07650491 -0.01174887 -0.0006033242
# man   0.01509488 -0.001609208 -0.05189265 -0.11181487  0.01717142  0.0008817816
#              [,19]       [,20]       [,21]       [,22]       [,23]       [,24]
# auto -0.0006232281  0.07383008 -0.01330168  0.01277073 -0.01429343 -0.02392875
# man   0.0009108719 -0.10790550  0.01944092 -0.01866492  0.02089040  0.03497279
#            [,25]       [,26]        [,27]       [,28]       [,29]       [,30]
# auto -0.06914549 -0.04035720 -0.002583050 -0.02019206 -0.01685685  0.02353137
# man   0.10105879  0.05898359  0.003775228  0.02951147  0.02463693 -0.03439200
#             [,31]
# auto -0.000779707
# man   0.001139572
# 
# $gear
#               [,1]        [,2]        [,3]        [,4]        [,5]         [,6]
# 3gears  0.12106724  0.09770306  0.07589747 -0.07315583 -0.03081162 -0.066799492
# 4gears -0.14234025 -0.02959144 -0.08546392  0.12963956  0.06955175  0.084748422
# 5gears -0.02158512 -0.22208971 -0.02257900 -0.09166746 -0.07448934 -0.002997738
#              [,7]        [,8]        [,9]      [,10]       [,11]      [,12]
# 3gears  0.1065692 -0.06538972  0.11372501  0.0454050 -0.06556007 -0.0959802
# 4gears -0.2073524  0.07625594 -0.02478723 -0.1037316  0.05625942  0.1790518
# 5gears  0.1779382  0.01315490 -0.28168568  0.1127409  0.06165760 -0.1417836
#              [,13]       [,14]        [,15]       [,16]       [,17]
# 3gears  0.05005808  0.03640637 -0.033241734 -0.02391663  0.01116494
# 4gears -0.09349055  0.03869117  0.004675762 -0.02627964 -0.02814350
# 5gears  0.07420309 -0.20207790  0.088503372  0.13482101  0.03404958
#               [,18]       [,19]       [,20]       [,21]       [,22]       [,23]
# 3gears -0.023670614  0.06339873  0.05928754  0.04459290  0.05588570 -0.07366062
# 4gears  0.032013063 -0.04404007 -0.08078266 -0.03495257 -0.05338752  0.07010498
# 5gears -0.005819509 -0.08450003  0.01601576 -0.04989252 -0.03952707  0.05272991
#              [,24]       [,25]       [,26]         [,27]       [,28]
# 3gears -0.03408354  0.03069656  0.02791122 -0.0005542917 -0.03047939
# 4gears  0.04706861 -0.02028960 -0.05537577  0.0009342331  0.02246126
# 5gears -0.01071405 -0.04339464  0.04916819 -0.0005792842  0.03753114
#               [,29]        [,30]       [,31]
# 3gears -0.019510654  0.002068574 -0.01041160
# 4gears  0.025131019  0.021957526  0.01220677
# 5gears -0.001782483 -0.058903784  0.00193854
# 
# $carb
#               [,1]         [,2]        [,3]        [,4]        [,5]        [,6]
# carb1 -0.068656121  0.098230919  0.04999729  0.01218844 -0.24246602 -0.27493203
# carb2 -0.025784127  0.007698923  0.07839825 -0.25327755  0.13055248  0.28793908
# carb3  0.035117330  0.032202222  0.01393419 -0.10816562 -0.03907876 -0.15755123
# carb4  0.061987521 -0.078161486 -0.11219852  0.28519601  0.06683994 -0.03248967
# carb6  0.001885105 -0.033818016 -0.04109148 -0.04654421 -0.03133337 -0.10678645
# carb8  0.011321812 -0.045779451 -0.01268937 -0.03346266 -0.12809239 -0.05052971
#                [,7]         [,8]        [,9]        [,10]         [,11]
# carb1  0.1027967935  0.008173285  0.09826384 -0.101108199 -0.0449594755
# carb2  0.0366205831 -0.049014404 -0.13086328 -0.050696015  0.0077397731
# carb3 -0.1658976667  0.318369219 -0.01136846  0.109696405  0.0370542932
# carb4 -0.0657591017 -0.074325449  0.08744240  0.077599222  0.0004205389
# carb6  0.0701771145  0.111215126 -0.06220684  0.112424017  0.1430317791
# carb8 -0.0006764833  0.109862747 -0.15732583 -0.002787905 -0.0210814502
#             [,12]       [,13]       [,14]       [,15]       [,16]       [,17]
# carb1  0.06109524  0.01741200 -0.03232364  0.05739035  0.02569445 -0.12671869
# carb2  0.07646719  0.04343916  0.04975765  0.01410277 -0.01921447  0.05500047
# carb3 -0.12730049 -0.00896036 -0.03965589 -0.06314997 -0.03875477  0.03167578
# carb4 -0.09907006 -0.05158024 -0.03149012 -0.04205995  0.01176418  0.03707736
# carb6  0.08473892  0.08692696  0.10281840  0.12612731 -0.07093138 -0.02021973
# carb8  0.09552452 -0.10051908  0.05973948 -0.05883806  0.08183742 -0.10855505
#              [,18]         [,19]        [,20]        [,21]        [,22]
# carb1 -0.015107833 -0.0096621043  0.041781144 -0.105367720  0.087156163
# carb2  0.009616878  0.0097561072 -0.011229840  0.108002094  0.047460013
# carb3  0.018628774 -0.0008134465 -0.086674532  0.004126137 -0.065552792
# carb4  0.005565021 -0.0042415083 -0.002423526 -0.043419763 -0.093662218
# carb6 -0.133957255 -0.0713099442  0.026264399  0.003941760  0.052860935
# carb8  0.032006772  0.0862390249  0.077824845  0.075430561 -0.004273653
#             [,23]       [,24]       [,25]        [,26]        [,27]
# carb1  0.03382337  0.05596256  0.02446308 -0.007523787 -0.001517649
# carb2  0.01359275  0.02213222  0.05603528  0.005034121 -0.004516489
# carb3 -0.01799393 -0.04654135 -0.02309081 -0.003128019 -0.013482984
# carb4 -0.02911725 -0.04882877 -0.05979430  0.009421616  0.009654519
# carb6  0.02468432  0.02773048 -0.03575959 -0.032108103 -0.014411053
# carb8 -0.05222105 -0.01287883 -0.02861931 -0.050398709  0.014103244
#               [,28]        [,29]        [,30]       [,31]
# carb1 -0.0337518709 -0.060393150  0.008041559 -0.01120490
# carb2  0.0508881072  0.064196690  0.038050134  0.03225107
# carb3 -0.0544887238 -0.004621512  0.053630419  0.06554406
# carb4  0.0007741526 -0.018833936 -0.058545227 -0.04643358
# carb6 -0.0474418747 -0.009987110 -0.025087393 -0.01175113
# carb8 -0.0694514546 -0.007023844  0.012856153  0.03537835